Enrichr - a gene set enrichment analysis web server: Website, Paper
enrichR - an R interface to all ‘Enrichr’ databases: CRAN
Demo Dataset: E-MTAB-8411 from "The clock gene Bmal1 inhibits macrophage motility, phagocytosis, and impairs defense against pneumonia." PNAS. 2020;117(3):1543-1551.
License: GPL-3.0
cd /ngs/GO-Enrichment-Analysis-Demo
R
If you have downloaded the DESeq2_DEG.txt file with wget:
If you would like to download the file in R now:
data <- data.table::fread("https://raw.githubusercontent.com/ycl6/GO-Enrichment-Analysis-Demo/master/DESeq2_DEG.txt")
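# Keep the first 18 characters of each mouse Ensembl gene ID (ENSMUSG + 11 digits), dropping any version suffix
data$GeneID <- substr(data$GeneID, 1, 18)
## GeneID GeneSymbol log2fc pvalue padj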
## 1: ENSMUSG00000000001 Gnai3 0.08804493 0.1925609 0.6146732
## 2: ENSMUSG00000000003 Pbsn NA NA NA
## 3: ENSMUSG00000000028 Cdc45 -0.32106635 0.1401127 0.5331437
## 4: ENSMUSG00000000031 H19 -1.20339889 0.7161464 NA
## 5: ENSMUSG00000000037 Scml2 -0.57746426 0.2979159 NA
## ---
## 55381: ENSMUSG00000118636 AC117663.3 NA NA NA
## 55382: ENSMUSG00000118637 AL772212.1 NA NA NA
## 55383: ENSMUSG00000118638 AL805980.1 NA NA NA
## 55384: ENSMUSG00000118639 AL590997.4 NA NA NA
## 55385: ENSMUSG00000118640 AC167036.2 0.06415390 0.9208414 NA
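The substr(..., 1, 18) call works because mouse Ensembl gene IDs are always 18 characters long (ENSMUSG followed by 11 digits). As a hedged sketch on made-up IDs, the version suffix can also be removed with a regular expression that does not depend on the ID width. This matters for other species: human ENSG IDs, for instance, are 15 characters long.

```r
# Made-up Ensembl gene IDs, two with a version suffix (not from the dataset)
ids <- c("ENSMUSG00000000001.4", "ENSMUSG00000000028.15", "ENSMUSG00000118640")

# Approach used in this demo: keep the first 18 characters
by_width <- substr(ids, 1, 18)

# Width-independent alternative: strip a trailing ".<digits>" suffix
by_regex <- sub("\\.[0-9]+$", "", ids)

print(by_width)
print(identical(by_width, by_regex))
```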
up.idx <- which(data$padj < 0.05 & data$log2fc > 0) # FDR < 0.05 and logFC > 0
dn.idx <- which(data$padj < 0.05 & data$log2fc < 0) # FDR < 0.05 and logFC < 0
## [1] 55385 5
## [1] 383
## [1] 429
## [1] "Axin2" "Hnrnpd" "Kcnn3" "Mapk7" "Agpat3" "Sema6b" "Efnb2" "Il16" "Ltbp1"
## [10] "Rgs19"
## [1] "Cox5a" "Pdgfb" "Itga5" "Cd52" "Dnmt3l" "Tubb6" "Ell2" "Ifrd1" "Stk38l"
## [10] "Ubl3"
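The up.genes and dn.genes vectors passed to Enrichr hold the identifiers of the rows selected by up.idx and dn.idx. The minimal sketch below uses a hypothetical four-gene table with the same column names as DESeq2_DEG.txt. It also shows why which() is used here. A comparison against an NA padj (as for H19 above) evaluates to NA, and which() keeps only the TRUE positions. Plain logical indexing, by contrast, would return spurious all-NA rows.

```r
# Hypothetical stand-in for the DESeq2 result table (toy values)
toy <- data.frame(
  GeneID     = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C", "ENSMUSG_D"),
  GeneSymbol = c("GeneA", "GeneB", "GeneC", "GeneD"),
  log2fc     = c(1.2, -0.8, 0.5, NA),
  padj       = c(0.01, 0.03, NA, NA),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons, so genes without an adjusted p-value are excluded
up.idx <- which(toy$padj < 0.05 & toy$log2fc > 0)
dn.idx <- which(toy$padj < 0.05 & toy$log2fc < 0)

# Gene symbols for Enrichr ...
up.genes <- toy$GeneSymbol[up.idx]
dn.genes <- toy$GeneSymbol[dn.idx]
# ... or Ensembl IDs, if symbols are not available
up.ids <- toy$GeneID[up.idx]

# Pitfall: logical indexing keeps NA comparisons as all-NA rows
naive <- toy[toy$padj < 0.05 & toy$log2fc > 0, ]
print(nrow(naive))
```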
Alternatively, if you only have Ensembl gene IDs:
## [1] "ENSMUSG00000000142" "ENSMUSG00000000568" "ENSMUSG00000000794" "ENSMUSG00000001034"
## [5] "ENSMUSG00000001211" "ENSMUSG00000001227" "ENSMUSG00000001300" "ENSMUSG00000001741"
## [9] "ENSMUSG00000001870" "ENSMUSG00000002458"
## [1] "ENSMUSG00000000088" "ENSMUSG00000000489" "ENSMUSG00000000555" "ENSMUSG00000000682"
## [5] "ENSMUSG00000000730" "ENSMUSG00000001473" "ENSMUSG00000001542" "ENSMUSG00000001627"
## [9] "ENSMUSG00000001630" "ENSMUSG00000001687"
Any other identifier format must be converted to gene symbols (SYMBOL), which is the input identifier format Enrichr requires. This can be done with the select function from AnnotationDbi that we saw in Part 1 of this demo, or with the “Biological Id TRanslator” bitr function from clusterProfiler, which is a wrapper around AnnotationDbi::select.
Here, we use bitr to show how this can be done.
# Use fromType = "ENSEMBL" if your input identifier is Ensembl gene ID
up.genes.df <- clusterProfiler::bitr(up.genes, fromType = "ENSEMBL", toType = "SYMBOL", OrgDb = "org.Mm.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## ENSEMBL SYMBOL
## 1 ENSMUSG00000000142 Axin2
## 2 ENSMUSG00000000568 Hnrnpd
## 3 ENSMUSG00000000794 Kcnn3
## 4 ENSMUSG00000001034 Mapk7
## 5 ENSMUSG00000001211 Agpat3
## 6 ENSMUSG00000001227 Sema6b
## 7 ENSMUSG00000001300 Efnb2
## 8 ENSMUSG00000001741 Il16
## 9 ENSMUSG00000001870 Ltbp1
## 10 ENSMUSG00000002458 Rgs19
dn.genes.df <- clusterProfiler::bitr(dn.genes, fromType = "ENSEMBL", toType = "SYMBOL", OrgDb = "org.Mm.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## ENSEMBL SYMBOL
## 1 ENSMUSG00000000088 Cox5a
## 2 ENSMUSG00000000489 Pdgfb
## 3 ENSMUSG00000000555 Itga5
## 4 ENSMUSG00000000682 Cd52
## 5 ENSMUSG00000000730 Dnmt3l
## 6 ENSMUSG00000001473 Tubb6
## 7 ENSMUSG00000001542 Ell2
## 8 ENSMUSG00000001627 Ifrd1
## 9 ENSMUSG00000001630 Stk38l
## 10 ENSMUSG00000001687 Ubl3
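The "'select()' returned 1:many mapping" message means that some Ensembl IDs matched more than one symbol, so the bitr result can contain more rows than the input. The sketch below shows one simple way to keep a single symbol per ID before querying Enrichr. It is admittedly arbitrary because it keeps the first match. The mapping is hypothetical; only the ENSEMBL and SYMBOL column names match the bitr output shown above.

```r
# Hypothetical bitr()-style result: ENSMUSG_B maps to two symbols
mapped <- data.frame(
  ENSEMBL = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_B", "ENSMUSG_C"),
  SYMBOL  = c("GeneA", "GeneB1", "GeneB2", "GeneC"),
  stringsAsFactors = FALSE
)

# Keep the first symbol reported for each Ensembl ID
one_per_id <- mapped[!duplicated(mapped$ENSEMBL), ]

# Enrichr takes a plain character vector of unique symbols
symbols <- unique(one_per_id$SYMBOL)
print(symbols)
```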
List available databases from Enrichr
## [1] "data.frame"
## [1] 167 5
## geneCoverage genesPerTerm libraryName
## 56 4271 128 Achilles_fitness_decrease
## 55 4320 129 Achilles_fitness_increase
## 82 16129 292 Aging_Perturbations_from_GEO_down
## 83 15309 308 Aging_Perturbations_from_GEO_up
## 53 13877 304 Allen_Brain_Atlas_down
## 49 13121 305 Allen_Brain_Atlas_up
## link numTerms
## 56 http://www.broadinstitute.org/achilles 216
## 55 http://www.broadinstitute.org/achilles 216
## 82 http://www.ncbi.nlm.nih.gov/geo/ 286
## 83 http://www.ncbi.nlm.nih.gov/geo/ 286
## 53 http://www.brain-map.org/ 2192
## 49 http://www.brain-map.org/ 2192
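The database listing is an ordinary data.frame (its class is printed above), so standard base-R subsetting can help shortlist libraries. The sketch below rebuilds the six rows printed above and keeps libraries that cover at least 10,000 genes, sorted by coverage. The threshold is an arbitrary choice for illustration.

```r
# The six library rows printed above (link column omitted)
libs <- data.frame(
  libraryName  = c("Achilles_fitness_decrease", "Achilles_fitness_increase",
                   "Aging_Perturbations_from_GEO_down", "Aging_Perturbations_from_GEO_up",
                   "Allen_Brain_Atlas_down", "Allen_Brain_Atlas_up"),
  geneCoverage = c(4271, 4320, 16129, 15309, 13877, 13121),
  genesPerTerm = c(128, 129, 292, 308, 304, 305),
  numTerms     = c(216, 216, 286, 286, 2192, 2192),
  stringsAsFactors = FALSE
)

# Libraries covering at least 10,000 genes, largest coverage first
wide <- libs[libs$geneCoverage >= 10000, ]
wide <- wide[order(wide$geneCoverage, decreasing = TRUE), ]
print(wide$libraryName)
```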
Show all database names.
## [1] "Achilles_fitness_decrease"
## [2] "Achilles_fitness_increase"
## [3] "Aging_Perturbations_from_GEO_down"
## [4] "Aging_Perturbations_from_GEO_up"
## [5] "Allen_Brain_Atlas_down"
## [6] "Allen_Brain_Atlas_up"
## [7] "ARCHS4_Cell-lines"
## [8] "ARCHS4_IDG_Coexp"
## [9] "ARCHS4_Kinases_Coexp"
## [10] "ARCHS4_TFs_Coexp"
## [11] "ARCHS4_Tissues"
## [12] "BioCarta_2013"
## [13] "BioCarta_2015"
## [14] "BioCarta_2016"
## [15] "BioPlanet_2019"
## [16] "BioPlex_2017"
## [17] "Cancer_Cell_Line_Encyclopedia"
## [18] "CCLE_Proteomics_2020"
## [19] "ChEA_2013"
## [20] "ChEA_2015"
## [21] "ChEA_2016"
## [22] "Chromosome_Location"
## [23] "Chromosome_Location_hg19"
## [24] "ClinVar_2019"
## [25] "CORUM"
## [26] "COVID-19_Related_Gene_Sets"
## [27] "Data_Acquisition_Method_Most_Popular_Genes"
## [28] "dbGaP"
## [29] "DepMap_WG_CRISPR_Screens_Broad_CellLines_2019"
## [30] "DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019"
## [31] "Disease_Perturbations_from_GEO_down"
## [32] "Disease_Perturbations_from_GEO_up"
## [33] "Disease_Signatures_from_GEO_down_2014"
## [34] "Disease_Signatures_from_GEO_up_2014"
## [35] "DisGeNET"
## [36] "Drug_Perturbations_from_GEO_2014"
## [37] "Drug_Perturbations_from_GEO_down"
## [38] "Drug_Perturbations_from_GEO_up"
## [39] "DrugMatrix"
## [40] "DSigDB"
## [41] "Elsevier_Pathway_Collection"
## [42] "ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X"
## [43] "ENCODE_Histone_Modifications_2013"
## [44] "ENCODE_Histone_Modifications_2015"
## [45] "ENCODE_TF_ChIP-seq_2014"
## [46] "ENCODE_TF_ChIP-seq_2015"
## [47] "Enrichr_Libraries_Most_Popular_Genes"
## [48] "Enrichr_Submissions_TF-Gene_Coocurrence"
## [49] "Epigenomics_Roadmap_HM_ChIP-seq"
## [50] "ESCAPE"
## [51] "Gene_Perturbations_from_GEO_down"
## [52] "Gene_Perturbations_from_GEO_up"
## [53] "Genes_Associated_with_NIH_Grants"
## [54] "GeneSigDB"
## [55] "Genome_Browser_PWMs"
## [56] "GO_Biological_Process_2013"
## [57] "GO_Biological_Process_2015"
## [58] "GO_Biological_Process_2017"
## [59] "GO_Biological_Process_2017b"
## [60] "GO_Biological_Process_2018"
## [61] "GO_Cellular_Component_2013"
## [62] "GO_Cellular_Component_2015"
## [63] "GO_Cellular_Component_2017"
## [64] "GO_Cellular_Component_2017b"
## [65] "GO_Cellular_Component_2018"
## [66] "GO_Molecular_Function_2013"
## [67] "GO_Molecular_Function_2015"
## [68] "GO_Molecular_Function_2017"
## [69] "GO_Molecular_Function_2017b"
## [70] "GO_Molecular_Function_2018"
## [71] "GTEx_Tissue_Sample_Gene_Expression_Profiles_down"
## [72] "GTEx_Tissue_Sample_Gene_Expression_Profiles_up"
## [73] "GWAS_Catalog_2019"
## [74] "HMDB_Metabolites"
## [75] "HMS_LINCS_KinomeScan"
## [76] "HomoloGene"
## [77] "Human_Gene_Atlas"
## [78] "Human_Phenotype_Ontology"
## [79] "HumanCyc_2015"
## [80] "HumanCyc_2016"
## [81] "huMAP"
## [82] "InterPro_Domains_2019"
## [83] "Jensen_COMPARTMENTS"
## [84] "Jensen_DISEASES"
## [85] "Jensen_TISSUES"
## [86] "KEA_2013"
## [87] "KEA_2015"
## [88] "KEGG_2013"
## [89] "KEGG_2015"
## [90] "KEGG_2016"
## [91] "KEGG_2019_Human"
## [92] "KEGG_2019_Mouse"
## [93] "Kinase_Perturbations_from_GEO_down"
## [94] "Kinase_Perturbations_from_GEO_up"
## [95] "L1000_Kinase_and_GPCR_Perturbations_down"
## [96] "L1000_Kinase_and_GPCR_Perturbations_up"
## [97] "Ligand_Perturbations_from_GEO_down"
## [98] "Ligand_Perturbations_from_GEO_up"
## [99] "LINCS_L1000_Chem_Pert_down"
## [100] "LINCS_L1000_Chem_Pert_up"
## [101] "LINCS_L1000_Ligand_Perturbations_down"
## [102] "LINCS_L1000_Ligand_Perturbations_up"
## [103] "lncHUB_lncRNA_Co-Expression"
## [104] "MCF7_Perturbations_from_GEO_down"
## [105] "MCF7_Perturbations_from_GEO_up"
## [106] "MGI_Mammalian_Phenotype_2013"
## [107] "MGI_Mammalian_Phenotype_2017"
## [108] "MGI_Mammalian_Phenotype_Level_3"
## [109] "MGI_Mammalian_Phenotype_Level_4"
## [110] "MGI_Mammalian_Phenotype_Level_4_2019"
## [111] "Microbe_Perturbations_from_GEO_down"
## [112] "Microbe_Perturbations_from_GEO_up"
## [113] "miRTarBase_2017"
## [114] "Mouse_Gene_Atlas"
## [115] "MSigDB_Computational"
## [116] "MSigDB_Oncogenic_Signatures"
## [117] "NCI-60_Cancer_Cell_Lines"
## [118] "NCI-Nature_2015"
## [119] "NCI-Nature_2016"
## [120] "NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions"
## [121] "NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions"
## [122] "NIH_Funded_PIs_2017_Human_AutoRIF"
## [123] "NIH_Funded_PIs_2017_Human_GeneRIF"
## [124] "NURSA_Human_Endogenous_Complexome"
## [125] "Old_CMAP_down"
## [126] "Old_CMAP_up"
## [127] "OMIM_Disease"
## [128] "OMIM_Expanded"
## [129] "Panther_2015"
## [130] "Panther_2016"
## [131] "Pfam_Domains_2019"
## [132] "Pfam_InterPro_Domains"
## [133] "PheWeb_2019"
## [134] "Phosphatase_Substrates_from_DEPOD"
## [135] "PPI_Hub_Proteins"
## [136] "ProteomicsDB_2020"
## [137] "Rare_Diseases_AutoRIF_ARCHS4_Predictions"
## [138] "Rare_Diseases_AutoRIF_Gene_Lists"
## [139] "Rare_Diseases_GeneRIF_ARCHS4_Predictions"
## [140] "Rare_Diseases_GeneRIF_Gene_Lists"
## [141] "Reactome_2013"
## [142] "Reactome_2015"
## [143] "Reactome_2016"
## [144] "RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO"
## [145] "SILAC_Phosphoproteomics"
## [146] "SubCell_BarCode"
## [147] "SysMyo_Muscle_Gene_Sets"
## [148] "Table_Mining_of_CRISPR_Studies"
## [149] "TargetScan_microRNA"
## [150] "TargetScan_microRNA_2017"
## [151] "TF_Perturbations_Followed_by_Expression"
## [152] "TF-LOF_Expression_from_GEO"
## [153] "Tissue_Protein_Expression_from_Human_Proteome_Map"
## [154] "Tissue_Protein_Expression_from_ProteomicsDB"
## [155] "Transcription_Factor_PPIs"
## [156] "TRANSFAC_and_JASPAR_PWMs"
## [157] "TRRUST_Transcription_Factors_2019"
## [158] "UK_Biobank_GWAS_v1"
## [159] "Virus_Perturbations_from_GEO_down"
## [160] "Virus_Perturbations_from_GEO_up"
## [161] "Virus-Host_PPI_P-HIPSTer_2020"
## [162] "VirusMINT"
## [163] "WikiPathways_2013"
## [164] "WikiPathways_2015"
## [165] "WikiPathways_2016"
## [166] "WikiPathways_2019_Human"
## [167] "WikiPathways_2019_Mouse"
Search for mouse databases with keyword "Mouse"
## [1] "KEGG_2019_Mouse" "Mouse_Gene_Atlas" "WikiPathways_2019_Mouse"
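The keyword search amounts to a grep over the library names. The sketch below runs it on a few of the names listed above. Note that grep is case-sensitive unless ignore.case = TRUE is set.

```r
# A few library names taken from the list above
lib_names <- c("KEGG_2019_Human", "KEGG_2019_Mouse", "Mouse_Gene_Atlas",
               "WikiPathways_2019_Human", "WikiPathways_2019_Mouse")

# value = TRUE returns the matching names rather than their positions
mouse_libs <- grep("Mouse", lib_names, value = TRUE)
print(mouse_libs)
```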
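Query the three GO 2018 libraries (Molecular Function, Cellular Component and Biological Process) with the up- and down-regulated gene lists:
## Uploading data to Enrichr... Done.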
## Querying GO_Molecular_Function_2018... Done.
## Querying GO_Cellular_Component_2018... Done.
## Querying GO_Biological_Process_2018... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2018... Done.
## Querying GO_Cellular_Component_2018... Done.
## Querying GO_Biological_Process_2018... Done.
## Parsing results... Done.
## [1] "list"
## [1] "GO_Molecular_Function_2018" "GO_Cellular_Component_2018" "GO_Biological_Process_2018"
## Term Overlap
## 1 GTPase activator activity (GO:0005096) 15/249
## 2 GTPase regulator activity (GO:0030695) 15/275
## 3 peptidoglycan binding (GO:0042834) 3/16
## 4 Rho guanyl-nucleotide exchange factor activity (GO:0005089) 5/59
## 5 sodium channel inhibitor activity (GO:0019871) 2/6
## 6 beta-galactoside (CMP) alpha-2,3-sialyltransferase activity (GO:0003836) 2/7
## 7 phosphotransferase activity, alcohol group as acceptor (GO:0016773) 11/254
## 8 UDP-glucosyltransferase activity (GO:0035251) 2/9
## 9 amyloid-beta binding (GO:0001540) 4/49
## 10 diacylglycerol kinase activity (GO:0004143) 2/10
## P.value Adjusted.P.value Old.P.value Old.Adjusted.P.value Odds.Ratio
## 1 0.00006472282 0.07449596 0 0 3.265093
## 2 0.00019444229 0.11190154 0 0 2.956393
## 3 0.00291797886 1.00000000 0 0 10.162602
## 4 0.00460731779 1.00000000 0 0 4.593266
## 5 0.00484825127 1.00000000 0 0 18.066847
## 6 0.00670502310 1.00000000 0 0 15.485869
## 7 0.00780271435 1.00000000 0 0 2.347268
## 8 0.01121713722 1.00000000 0 0 12.044565
## 9 0.01257415028 1.00000000 0 0 4.424534
## 10 0.01385171595 1.00000000 0 0 10.840108
## Combined.Score
## 1 31.49312
## 2 25.26349
## 3 59.31772
## 4 24.71228
## 5 96.28071
## 6 77.50520
## 7 11.39196
## 8 54.08386
## 9 19.36226
## 10 46.38858
## Genes
## 1 VAV3;RAP1GAP2;RGS18;STARD13;DOCK4;TBC1D9B;RGS19;RGS14;DAB2IP;AXIN2;ARHGAP12;ARHGAP32;CDC42EP3;RGS11;EVI5
## 2 VAV3;RAP1GAP2;RGS18;STARD13;DOCK4;TBC1D9B;RGS19;RGS14;DAB2IP;AXIN2;ARHGAP12;ARHGAP32;CDC42EP3;RGS11;EVI5
## 3 NOD1;NOD2;PGLYRP1
## 4 VAV3;TIAM2;NET1;FARP1;ITSN1
## 5 NEDD4;SCN1B
## 6 ST3GAL4;ST3GAL5
## 7 DGKG;RASSF2;HYKK;PKDCC;EEF2K;DGKA;STK26;TNIK;PRKG1;PDK1;KHK
## 8 UGCG;GYS1
## 9 MSR1;FZD4;APBB1;SORL1
## 10 DGKG;DGKA
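Each Overlap value has the form k/n: k genes from the input list belong to a term that contains n genes. The sketch below uses the first three GO_Molecular_Function_2018 rows printed above. It parses Overlap with the same regular expressions as the .enrichment_prep_df helper and applies a 5% FDR cut-off to Adjusted.P.value. None of these three rows passes the cut-off, which is worth keeping in mind when reading the ranked lists.

```r
# First three GO_Molecular_Function_2018 rows printed above (subset of columns)
res <- data.frame(
  Term = c("GTPase activator activity (GO:0005096)",
           "GTPase regulator activity (GO:0030695)",
           "peptidoglycan binding (GO:0042834)"),
  Overlap = c("15/249", "15/275", "3/16"),
  Adjusted.P.value = c(0.07449596, 0.11190154, 1.00000000),
  stringsAsFactors = FALSE
)

# "k/n": k = input genes in the term, n = term size
res$Significant <- as.numeric(sub("/\\d+$", "", res$Overlap))
res$Annotated   <- as.numeric(sub("^\\d+/", "", res$Overlap))
res$Ratio       <- res$Significant / res$Annotated

# Terms passing a 5% FDR threshold
sig <- res[res$Adjusted.P.value < 0.05, ]
print(res[, c("Term", "Significant", "Annotated", "Ratio")])
print(nrow(sig))
```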
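Repeat the analysis with the mouse pathway libraries KEGG_2019_Mouse and WikiPathways_2019_Mouse, plus BioPlanet_2019:
## Uploading data to Enrichr... Done.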
## Querying KEGG_2019_Mouse... Done.
## Querying WikiPathways_2019_Mouse... Done.
## Querying BioPlanet_2019... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying KEGG_2019_Mouse... Done.
## Querying WikiPathways_2019_Mouse... Done.
## Querying BioPlanet_2019... Done.
## Parsing results... Done.
## [1] "list"
## [1] "KEGG_2019_Mouse" "WikiPathways_2019_Mouse" "BioPlanet_2019"
## Term Overlap P.value
## 1 Arrhythmogenic right ventricular cardiomyopathy (ARVC) 7/72 0.0003616486
## 2 Rap1 signaling pathway 12/209 0.0005298340
## 3 Complement and coagulation cascades 7/88 0.0012125395
## 4 Dilated cardiomyopathy (DCM) 7/90 0.0013830696
## 5 Transcriptional misregulation in cancer 10/183 0.0021834417
## 6 Propanoate metabolism 4/31 0.0024193336
## 7 Other glycan degradation 3/18 0.0041373981
## 8 Basal cell carcinoma 5/63 0.0060940255
## 9 Regulation of actin cytoskeleton 10/217 0.0072377365
## 10 Lysosome 7/124 0.0082086689
## Adjusted.P.value Old.P.value Old.Adjusted.P.value Odds.Ratio Combined.Score
## 1 0.10957952 0 0 5.269497 41.75991
## 2 0.08026986 0 0 3.111993 23.47360
## 3 0.12246649 0 0 4.311407 28.95126
## 4 0.10476752 0 0 4.215598 27.75318
## 5 0.13231657 0 0 2.961778 18.14638
## 6 0.12217635 0 0 6.993618 42.13140
## 7 0.17909023 0 0 9.033424 49.57261
## 8 0.23081122 0 0 4.301630 21.94023
## 9 0.24367046 0 0 2.497721 12.30988
## 10 0.24872267 0 0 3.059708 14.69445
## Genes
## 1 ITGB1;TCF7L2;CACNB3;TCF7L1;ITGA4;ITGA6;ITGA9
## 2 ITGB1;DOCK4;SIPA1L1;ITGAM;RGS14;FLT4;ADCY3;LCP2;RASGRP2;ADCY6;BCAR1;FGFR1
## 3 C4B;THBD;ITGAM;SERPINB2;CFH;CD55;F5
## 4 ITGB1;CACNB3;ITGA4;ADCY3;ITGA6;ADCY6;ITGA9
## 5 PER2;SMAD1;LYL1;MAF;ITGAM;ZEB1;NUPR1;MMP9;KLF3;PBX1
## 6 LDHA;ACACA;HIBCH;ACAT1
## 7 GLB1;GBA2;ENGASE
## 8 TCF7L2;TCF7L1;BMP2;FZD4;AXIN2
## 9 VAV3;ENAH;ITGB1;ITGAM;ITGA4;ITGA6;SSH2;BCAR1;FGFR1;ITGA9
## 10 PLA2G15;GALNS;LAPTM4B;CTSL;GLB1;ARSG;LGMN
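The Genes column stores the overlapping genes as one semicolon-separated string. Those genes appear in upper case even though the input was mouse symbols: compare Axin2 and Rgs19 in the up-regulated list with AXIN2 and RGS19 in the GO output. The sketch below splits the first KEGG_2019_Mouse entry printed above and matches it back to mouse-style symbols by upper-casing them. The input vector is hypothetical, and upper-casing is only a heuristic for mouse symbols.

```r
# Genes string of the first KEGG_2019_Mouse term printed above (ARVC)
genes_str <- "ITGB1;TCF7L2;CACNB3;TCF7L1;ITGA4;ITGA6;ITGA9"
hit_genes <- strsplit(genes_str, ";", fixed = TRUE)[[1]]

# Hypothetical mouse-style input symbols
input_symbols <- c("Itgb1", "Tcf7l2", "Axin2", "Gnai3")

# Heuristic match: compare in upper case
in_term <- input_symbols[toupper(input_symbols) %in% hit_genes]
print(in_term)
```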
Repeat the analysis with the disease and phenotype libraries PheWeb_2019 and ClinVar_2019:
## Uploading data to Enrichr... Done.
## Querying PheWeb_2019... Done.
## Querying ClinVar_2019... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying PheWeb_2019... Done.
## Querying ClinVar_2019... Done.
## Parsing results... Done.
## [1] "list"
## [1] "PheWeb_2019" "ClinVar_2019"
## Term Overlap P.value Adjusted.P.value Old.P.value
## 1 Difficulty in walking 4/21 0.0005320936 0.6177606 0
## 2 Circulatory disease NEC 3/24 0.0094582687 1.0000000 0
## 3 Eustachian tube disorders 2/12 0.0198281761 1.0000000 0
## 4 Hypercoagulable state 3/32 0.0208044238 1.0000000 0
## 5 Hypercholesterolemia 3/35 0.0263683143 1.0000000 0
## 6 Primary hypercoagulable state 3/35 0.0263683143 1.0000000 0
## 7 Acute sinusitis 2/14 0.0266846266 1.0000000 0
## 8 Costochondritis 2/14 0.0266846266 1.0000000 0
## 9 Other venous embolism and thrombosis 3/36 0.0283823921 1.0000000 0
## 10 Bipolar 2/15 0.0304201943 1.0000000 0
## Old.Adjusted.P.value Odds.Ratio Combined.Score Genes
## 1 0 10.323913 77.82879 SLC12A5;AQP9;MTUS1;MMP9
## 2 0 6.775068 31.57768 KLHL33;HIBCH;TTC28
## 3 0 9.033424 35.41690 EPB41L3;ANO6
## 4 0 5.081301 19.67779 SELP;PRKG1;F5
## 5 0 4.645761 16.89009 ZCCHC24;CADM1;CDC42EP3
## 6 0 4.645761 16.89009 SELP;PRKG1;F5
## 7 0 7.742935 28.05782 PID1;LIFR
## 8 0 7.742935 28.05782 LTBP1;PRKG1
## 9 0 4.516712 16.08847 RWDD3;MCTP1;F5
## 10 0 7.226739 25.24046 OGFRL1;AKAP7
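Ordering by Combined.Score and ordering by P.value can rank terms quite differently. In the printed results, Combined.Score is consistent with Odds.Ratio multiplied by the negative natural log of P.value. This is why small terms with a large odds ratio, such as "sodium channel inhibitor activity" (2/6), score highly. The check below is based on the printed values only. Consult the Enrichr documentation for the definition used by your Enrichr version.

```r
# P.value, Odds.Ratio and Combined.Score of the first three PheWeb_2019 rows above
p        <- c(0.0005320936, 0.0094582687, 0.0198281761)
or       <- c(10.323913, 6.775068, 9.033424)
reported <- c(77.82879, 31.57768, 35.41690)

# log() is the natural logarithm in R
recomputed <- or * -log(p)
rel_err <- abs(recomputed - reported) / reported
print(round(recomputed, 5))
```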
.enrichment_prep_df function
.enrichment_prep_df <- function(df, showTerms, orderBy) {
  if(is.null(showTerms)) {
    showTerms <- nrow(df)
  } else if(!is.numeric(showTerms)) {
    stop(paste0("showTerms '", showTerms, "' is invalid."))
  }
  # Split the "k/n" Overlap string into term size (Annotated) and number of input genes in the term (Significant)
  Annotated <- as.numeric(sub("^\\d+/", "", as.character(df$Overlap)))
  Significant <- as.numeric(sub("/\\d+$", "", as.character(df$Overlap)))
  # Build data frame
  df <- cbind(df, data.frame(Annotated = Annotated, Significant = Significant,
                             stringsAsFactors = FALSE))
  # Order data frame (P.value or Combined.Score)
  if(orderBy == "Combined.Score") {
    idx <- order(df$Combined.Score, decreasing = TRUE)
  } else {
    idx <- order(df$P.value, decreasing = FALSE)
  }
  df <- df[idx,]
  # Subset to selected number of terms
  if(showTerms <= nrow(df)) {
    df <- df[seq_len(showTerms),]
  }
  return(df)
}
plotEnrich function
Here, we will use a plotEnrich function to visualise Enrichr results as bar plots.
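Two labelling details of plotEnrich can be checked in isolation. First, long term names are cut to numChar characters and "..." is appended. Second, the factor levels are reversed so that, after coord_flip(), the first (top-ranked) term is drawn at the top of the chart. The sketch below uses numChar = 30 and two term names from the results above.

```r
terms <- c("Rho guanyl-nucleotide exchange factor activity (GO:0005089)",
           "Lysosome")
num_char <- 30

# Trim to num_char characters, marking truncated names with "..."
short <- paste0(substr(terms, 1, num_char),
                ifelse(nchar(terms) > num_char, "...", ""))

# Reverse the level order: coord_flip() draws the last level at the top
short_f <- factor(short, levels = rev(unique(short)))
print(short)
print(levels(short_f))
```

One caveat of this scheme: two long names that share their first numChar characters collapse into one factor level, so their bars would be drawn at the same position.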
library(ggplot2)  # plotEnrich() builds the plot with ggplot2
plotEnrich <- function(df, showTerms = 20, numChar = 40, y = "Count", orderBy = "P.value",
                       xlab = NULL, ylab = NULL, title = NULL) {
  if(!is.numeric(numChar)) {
    stop(paste0("numChar '", numChar, "' is invalid."))
  }
  df <- .enrichment_prep_df(df, showTerms, orderBy)
  # Create trimmed name (as seen in topGO)
  shortName <- paste(substr(df$Term, 1, numChar),
                     ifelse(nchar(df$Term) > numChar, '...', ''), sep = '')
  df$shortName <- shortName
  df$shortName <- factor(df$shortName, levels = rev(unique(df$shortName)))
  df$Ratio <- df$Significant/df$Annotated
  # Define fill variable (P.value or Combined.Score)
  if(orderBy == "Combined.Score") {
    fill <- "Combined.Score"
  } else {
    fill <- "P.value"
  }
  # Define y variable (Count or Ratio)
  if(y != "Ratio") {
    y <- "Significant"
  }
  # Define variable mapping; the .data pronoun replaces aes_string(), which newer ggplot2 versions deprecate
  map <- aes(x = .data[["shortName"]], y = .data[[y]], fill = .data[[fill]])
  # Define labels
  if(is.null(xlab)) {
    xlab <- "Enriched terms"
  }
  if(is.null(ylab)) {
    if(y == "Ratio") {
      ylab <- "Gene ratio"
    } else {
      ylab <- "Gene count"
    }
  }
  if(is.null(title)) {
    title <- "Enrichment analysis by Enrichr"
  }
  # Make the ggplot
  p <- ggplot(df, map) + geom_bar(stat = "identity") + coord_flip() + theme_bw()
  if(orderBy == "Combined.Score") {
    p <- p + scale_fill_continuous(low = "blue", high = "red") +
      guides(fill = guide_colorbar(title = "Combined Score", reverse = FALSE))
  } else {
    p <- p + scale_fill_continuous(low = "red", high = "blue") +
      guides(fill = guide_colorbar(title = "P value", reverse = TRUE))
  }
  # Adjust theme components
  p <- p + theme(axis.text.x = element_text(colour = "black", vjust = 1),
                 axis.text.y = element_text(colour = "black", hjust = 1),
                 axis.title = element_text(color = "black", margin = margin(10, 5, 0, 0)),
                 axis.title.y = element_text(angle = 90))
  p <- p + xlab(xlab) + ylab(ylab) + ggtitle(title)
  return(p)
}
printEnrich function
Here, we will use a printEnrich function to output Enrichr results to text files.
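printEnrich writes one tab-delimited file per library, named <prefix>_<library>.txt. The sketch below runs that write step on a hypothetical one-library result list: two PheWeb_2019 rows from above, reduced to three columns. It writes to a temporary directory and reads the file back.

```r
# Hypothetical result list shaped like the enrichr() output (one library)
res_list <- list(
  PheWeb_2019 = data.frame(Term    = c("Difficulty in walking", "Bipolar"),
                           Overlap = c("4/21", "2/15"),
                           P.value = c(0.0005320936, 0.0304201943),
                           stringsAsFactors = FALSE)
)

out_dir <- tempdir()
prefix <- file.path(out_dir, "enrichr-DD-up")

# One tab-delimited file per library: <prefix>_<library>.txt
for (dbname in names(res_list)) {
  filename <- paste0(prefix, "_", dbname, ".txt")
  write.table(res_list[[dbname]], file = filename, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = TRUE)
}

# Read one file back to check its contents
back <- read.delim(file.path(out_dir, "enrichr-DD-up_PheWeb_2019.txt"),
                   stringsAsFactors = FALSE)
print(back)
```

Because the files are written with quote = FALSE, term names that contain tab characters would break the column layout. Enrichr term names are not expected to contain tabs, but this is an assumption worth checking for custom libraries.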
printEnrich <- function(data, prefix = "enrichr", showTerms = NULL, columns = c(1:9)) {
  if(!is.numeric(columns)) {
    stop(paste0("columns '", columns, "' is invalid."))
  }
  for (i in seq_along(data)) {
    dbname <- names(data)[i]
    df <- data[[i]]
    df <- .enrichment_prep_df(df, showTerms, orderBy = "P.value")
    # Drop the helper columns added by .enrichment_prep_df
    df <- df[, !colnames(df) %in% c("Annotated", "Significant")]
    if(any(columns > ncol(df))) {
      stop("Undefined columns selected")
    }
    # Keep only the requested columns
    df <- df[, columns, drop = FALSE]
    filename <- paste0(prefix, "_", dbname, ".txt")
    write.table(df, file = filename, sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE)
  }
}
Demonstrate the use of different parameters to plot enrichment results.
plotEnrich(upEnriched_dd[[2]], showTerms = 10, numChar = 30, y = "Count", orderBy = "Combined.Score")
Write the top 20 terms (by P.value) from each library to tab-delimited text files.
printEnrich(upEnriched_go, prefix = "enrichr-GO-up", showTerms = 20)
printEnrich(dnEnriched_go, prefix = "enrichr-GO-dn", showTerms = 20)
printEnrich(upEnriched_pw, prefix = "enrichr-PW-up", showTerms = 20)
printEnrich(dnEnriched_pw, prefix = "enrichr-PW-dn", showTerms = 20)
printEnrich(upEnriched_dd, prefix = "enrichr-DD-up", showTerms = 20)
printEnrich(dnEnriched_dd, prefix = "enrichr-DD-dn", showTerms = 20)
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/r4/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
## [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets methods
## [9] base
##
## other attached packages:
## [1] org.Mm.eg.db_3.11.4 AnnotationDbi_1.50.3 IRanges_2.22.2 S4Vectors_0.26.1
## [5] Biobase_2.48.0 BiocGenerics_0.34.0 ggplot2_3.3.2 enrichR_2.1
## [9] knitr_1.29
##
## loaded via a namespace (and not attached):
## [1] enrichplot_1.8.1 bit64_4.0.2 RColorBrewer_1.1-2
## [4] progress_1.2.2 httr_1.4.2 tools_4.0.2
## [7] R6_2.4.1 DBI_1.1.0 colorspace_1.4-1
## [10] withr_2.2.0 tidyselect_1.1.0 gridExtra_2.3
## [13] prettyunits_1.1.1 bit_4.0.4 curl_4.3
## [16] compiler_4.0.2 scatterpie_0.1.4 xml2_1.3.2
## [19] labeling_0.3 triebeard_0.3.0 scales_1.1.1
## [22] ggridges_0.5.2 stringr_1.4.0 digest_0.6.25
## [25] rmarkdown_2.3 DOSE_3.14.0 pkgconfig_2.0.3
## [28] htmltools_0.5.0 rlang_0.4.7 RSQLite_2.2.0
## [31] gridGraphics_0.5-0 farver_2.0.3 generics_0.0.2
## [34] jsonlite_1.7.0 BiocParallel_1.22.0 GOSemSim_2.14.1
## [37] dplyr_1.0.1 magrittr_1.5 ggplotify_0.0.5
## [40] GO.db_3.11.4 Matrix_1.2-18 Rcpp_1.0.5
## [43] munsell_0.5.0 viridis_0.5.1 lifecycle_0.2.0
## [46] stringi_1.4.6 yaml_2.2.1 ggraph_2.0.3
## [49] MASS_7.3-51.6 plyr_1.8.6 qvalue_2.20.0
## [52] grid_4.0.2 blob_1.2.1 ggrepel_0.8.2
## [55] DO.db_2.9 crayon_1.3.4 lattice_0.20-41
## [58] graphlayouts_0.7.0 cowplot_1.0.0 splines_4.0.2
## [61] hms_0.5.3 pillar_1.4.6 fgsea_1.14.0
## [64] igraph_1.2.5 rjson_0.2.20 reshape2_1.4.4
## [67] fastmatch_1.1-0 glue_1.4.1 evaluate_0.14
## [70] downloader_0.4 BiocManager_1.30.10 data.table_1.13.0
## [73] vctrs_0.3.2 tweenr_1.0.1 urltools_1.7.3
## [76] gtable_0.3.0 purrr_0.3.4 polyclip_1.10-0
## [79] tidyr_1.1.1 xfun_0.16 ggforce_0.3.2
## [82] europepmc_0.4 tidygraph_1.2.0 viridisLite_0.3.0
## [85] tibble_3.0.3 rvcheck_0.1.8 clusterProfiler_3.16.0
## [88] memoise_1.1.0 ellipsis_0.3.1